Automated Multi-purpose Text Processing
Abstract
Multiple versions of a document may need to be produced for different purposes and so may require quite different stylistic structures. This paper describes how we are building stylistic control into both natural language parsers (Pundit) and generators (Penman) to handle variations in hospital patient educational materials. Stylistic knowledge is formally represented in terms of a multi-level stylistic grammar which defines a systematic correspondence between low-level syntactic structures and high-level stylistic effects.

1 The importance of style in natural language processing

The importance of dealing with stylistic aspects of language in computational systems is undeniable. People communicate a great deal of information through stylistic nuances, and a knowledge of how these subtleties influence meaning is part of a full understanding of language. Systems that could analyze the effects of style on communication would provide information about the implicit meaning that is contained in a text. And generation systems that could control style would produce text that intentionally conveys a specific communicative effect. Both stylistic analysis and generation could be used in applications, such as text critiquing, second-language instruction, and machine translation, for which understanding the effects of how something is said is as important as understanding what is said. Ultimately, computational stylistics should be a part of any system that attempts to deal with 'real-world' language.

But very few natural language understanding systems have attempted to deal with issues of style,[1] and those that do have generally taken a simplistic and heuristic approach. Stylistic analysis has not yet developed the systematic and rigorous methods of syntactic analysis and semantic interpretation. Part of the reason is obvious: understanding style is hard. Stylistic effects are difficult to articulate and even more difficult to define.

[1] By style, we do not mean literary style, but …
In the past several years, we have worked to address the problems of syntactic style, understanding how particular syntactic structures can convey corresponding stylistic effects. We are currently involved in a collaborative research project with Dietmar Rösner's group at the Forschungsinstitut für anwendungsorientierte Wissensverarbeitung (Ulm, Baden-Württemberg) on "Multi-purpose text generation from knowledge bases." The two participating sub-projects are both concerned with the generation of multiple versions of natural language text from a single knowledge base. In their TechDoc project, Rösner's group is addressing issues of multilingual generation in the production of technical manuals. In our HealthDoc project, we are developing systems …
Similar resources
Automatic semantic role labeling in Persian sentences using dependency trees
Automatic identification of words with semantic roles (such as Agent, Patient, Source, etc.) in sentences, and the attachment of correct semantic roles to them, may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
The syntax of concealment: reliable methods for plain text information hiding
Many plain text information hiding techniques demand deep semantic processing, and so suffer in reliability. In contrast, syntactic processing is a more mature and reliable technology. Assuming a perfect parser, this paper evaluates a set of automated and reversible syntactic transforms that can hide information in plain text without changing the meaning or style of a document. A large represen...
Treex - an open-source framework for natural language processing
The present paper describes Treex (formerly TectoMT), a multi-purpose open-source framework for developing Natural Language Processing applications. It facilitates the development by exploiting a wide range of software modules already integrated in Treex, such as tools for sentence segmentation, tokenization, morphological analysis, part-of-speech tagging, shallow and deep syntax parsing, named...
Explanation in Computational Stylometry
Computational stylometry, as in authorship attribution or profiling, has a large potential for applications in diverse areas: literary science, forensics, language psychology, sociolinguistics, even medical diagnosis. Yet, many of the basic research questions of this field are not studied systematically or even at all. In this paper we will go into these problems, and suggest that a reinterpret...
The Interaction of Syntactic Theory and Computational Psycholinguistics
Typically, current research in psycholinguistics does not rely heavily on results from theoretical linguistics. In particular, most experimental work studying human sentence processing makes very straightforward assumptions about sentence structure; essentially only a simple context-free grammar is assumed. The main text book in psycholinguistics, for instance, mentions Minimalism in its chapte...